Unsupervised Training for 3D Morphable Model Regression
We present a method for training a regression network from image pixels to 3D
morphable model coordinates using only unlabeled photographs. The training loss
is based on features from a facial recognition network, computed on-the-fly by
rendering the predicted faces with a differentiable renderer. To make training
from features feasible and avoid network fooling effects, we introduce three
objectives: a batch distribution loss that encourages the output distribution
to match the distribution of the morphable model, a loopback loss that ensures
the network can correctly reinterpret its own output, and a multi-view identity
loss that compares the features of the predicted 3D face and the input
photograph from multiple viewing angles. We train a regression network using
these objectives, a set of unlabeled photographs, and the morphable model
itself, and demonstrate state-of-the-art results.
Comment: CVPR 2018 version with supplemental material
(http://openaccess.thecvf.com/content_cvpr_2018/html/Genova_Unsupervised_Training_for_CVPR_2018_paper.html)
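The batch distribution loss above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes the morphable model's coefficient prior is a standard normal, so matching the output distribution reduces to penalizing deviations of the batch mean from zero and the batch variance from one:

```python
import numpy as np

def batch_distribution_loss(pred_coeffs):
    """Illustrative sketch of a batch distribution loss: encourage a batch of
    regressed morphable-model coefficients to match an assumed standard-normal
    prior. pred_coeffs: (batch, dims) array of predicted coordinates."""
    mean = pred_coeffs.mean(axis=0)   # per-dimension batch mean
    var = pred_coeffs.var(axis=0)     # per-dimension batch variance
    # Penalize deviation from zero mean and unit variance.
    return float(np.mean(mean ** 2) + np.mean((var - 1.0) ** 2))
```

A batch actually drawn from the prior yields a near-zero loss, while a collapsed batch (every prediction identical) is heavily penalized, which is the "network fooling" failure mode the objective guards against.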
Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision
We address efficient and structure-aware 3D scene representation from images.
Nerflets are our key contribution -- a set of local neural radiance fields that
together represent a scene. Each nerflet maintains its own spatial position,
orientation, and extent, within which it contributes to panoptic, density, and
radiance reconstructions. By leveraging only photometric and inferred panoptic
image supervision, we can directly and jointly optimize the parameters of a set
of nerflets so as to form a decomposed representation of the scene, where each
object instance is represented by a group of nerflets. During experiments with
indoor and outdoor environments, we find that nerflets: (1) fit and approximate
the scene more efficiently than traditional global NeRFs, (2) allow the
extraction of panoptic and photometric renderings from arbitrary views, and (3)
enable tasks rare for NeRFs, such as 3D panoptic segmentation and interactive
editing.
Comment: Accepted by CVPR 2023.
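One way to picture how a set of local fields with positions and extents can jointly represent a scene is a Gaussian-weighted blend. The sketch below is a simplification under assumed isotropic Gaussian influence functions, not the paper's parameterization (which also includes orientation and panoptic outputs):

```python
import numpy as np

def nerflet_influence(x, centers, scales):
    """Illustrative sketch: each local field ("nerflet") has a position
    (center) and extent (scale); its Gaussian influence at a query point
    determines how much it contributes there.
    x: (3,) query point; centers: (n, 3); scales: (n,)."""
    d2 = np.sum((centers - x) ** 2, axis=1)   # squared distance to each center
    w = np.exp(-d2 / (2.0 * scales ** 2))     # Gaussian influence per nerflet
    return w / (w.sum() + 1e-8)               # normalized blending weights

def blended_density(x, centers, scales, densities):
    """Blend per-nerflet density predictions by their influence weights."""
    return float(nerflet_influence(x, centers, scales) @ densities)
```

Because each nerflet's influence decays with distance, only the few nerflets near a query point contribute meaningfully, which is where the efficiency gain over a single global NeRF comes from.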
OpenScene: 3D Scene Understanding with Open Vocabularies
Traditional 3D scene understanding approaches rely on labeled 3D datasets to
train a model for a single task with supervision. We propose OpenScene, an
alternative approach where a model predicts dense features for 3D scene points
that are co-embedded with text and image pixels in CLIP feature space. This
zero-shot approach enables task-agnostic training and open-vocabulary queries.
For example, to perform state-of-the-art zero-shot 3D semantic segmentation,
it first infers CLIP features for every 3D point and then classifies them
based on their similarity to embeddings of arbitrary class labels. More
interestingly, it enables a suite of open-vocabulary scene understanding
applications that were not previously possible. For example, it allows a
user to enter an arbitrary
text query and then see a heat map indicating which parts of a scene match. Our
approach is effective at identifying objects, materials, affordances,
activities, and room types in complex 3D scenes, all using a single model
trained without any labeled 3D data.
Comment: CVPR 2023. Project page: https://pengsongyou.github.io/openscene
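The zero-shot classification step described above can be sketched in a few lines. This is an illustrative stand-in, not OpenScene's code: the point features and label embeddings are assumed to already live in a shared CLIP-style space, and each point simply takes the label with the highest cosine similarity:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between each row of a and each row of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def zero_shot_labels(point_feats, label_embeds):
    """Illustrative sketch of zero-shot 3D segmentation: assign each point the
    label whose text embedding is most similar to the point's feature.
    point_feats: (n_points, d); label_embeds: (n_labels, d)."""
    return np.argmax(cosine_sim(point_feats, label_embeds), axis=1)
```

Because the label set only enters through its text embeddings, swapping in a new vocabulary at query time requires no retraining, which is what makes open-vocabulary queries possible.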
Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation
Computer vision models learn to perform a task by capturing relevant
statistics from training data. It has been shown that models learn spurious
age, gender, and race correlations when trained for seemingly unrelated tasks
like activity recognition or image captioning. Various mitigation techniques
have been presented to prevent models from utilizing or learning such biases.
However, there has been little systematic comparison between these techniques.
We design a simple but surprisingly effective visual recognition benchmark for
studying bias mitigation. Using this benchmark, we provide a thorough analysis
of a wide range of techniques. We highlight the shortcomings of popular
adversarial training approaches for bias mitigation, propose a simple but
similarly effective alternative to the inference-time Reducing Bias
Amplification method of Zhao et al., and design a domain-independent training
technique that outperforms all other methods. Finally, we validate our findings
on the attribute classification task in the CelebA dataset, where attribute
presence is known to be correlated with the gender of people in the image, and
demonstrate that the proposed technique is effective at mitigating real-world
gender bias.
Comment: To appear in CVPR 2020.
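The domain-independent training idea can be sketched at inference time as follows. This is a simplified illustration under assumed linear heads, not the paper's exact formulation: one classifier head is kept per protected-domain value, and class scores are summed across heads so that no single domain-conditioned decision boundary dominates the prediction:

```python
import numpy as np

def domain_independent_predict(feats, heads):
    """Illustrative sketch of domain-independent inference: one linear head
    per (hypothetical) protected-domain value; class scores are summed across
    heads before taking the argmax.
    feats: (n, d) features; heads: (n_domains, d, n_classes) weights."""
    scores = np.einsum('nd,kdc->nkc', feats, heads)  # per-domain class scores
    return np.argmax(scores.sum(axis=1), axis=1)     # sum over domains, pick class
```

Training each head only on examples from its own domain prevents the shared classifier from exploiting domain-correlated shortcuts, while the summed inference rule keeps the final prediction domain-agnostic.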
- …